AD-A103 859    UNCLASSIFIED    F/G 12/1

WISCONSIN UNIV-MADISON MATHEMATICS RESEARCH CENTER
A DUAL DIFFERENTIABLE EXACT PENALTY FUNCTION. (U)
AUG 81    S.-P. HAN, O. L. MANGASARIAN    DAAG29-80-C-0041
MRC-TSR-2257


MRC  Technical  Summary  Report  #2257 

A  DUAL  DIFFERENTIABLE  EXACT  PENALTY 
FUNCTION 

S.  -P.  Han  and  O.  L.  Mangasarian 


Mathematics  Research  Center 
University  of  Wisconsin— Madison 
610  Walnut  Street 
Madison,  Wisconsin  53706 

August  1981 


(Received July 31, 1981)


Approved  for  public  release 
Distribution  unlimited 


Sponsored  by 

U.  S.  Army  Research  Office 
P. O. Box 12211
Research  Triangle  Park 
North  Carolina  27709 


National  Science  Foundation 
Washington,  D.  C.  20550 



UNIVERSITY  OF  WISCONSIN  -  MADISON 
MATHEMATICS  RESEARCH  CENTER 


A  DUAL  DIFFERENTIABLE  EXACT  PENALTY  FUNCTION 

S.-P. Han and O. L. Mangasarian

Technical Summary Report #2257
August 1981

ABSTRACT

A  new  penalty  function  is  associated  with  an  inequality  constrained 
nonlinear  programming  problem  via  its  dual.  This  penalty  function  is 
globally  differentiable  if  the  functions  defining  the  original  problem  are 
twice  globally  differentiable.  In  addition,  the  penalty  parameter  remains 
finite.  This  approach  reduces  the  original  problem  to  a  simple  problem 
of  maximizing  a  globally  differentiable  function  on  the  product  space  of 
a  Euclidean  space  and  the  nonnegative  orthant  of  another  Euclidean  space. 
Many  efficient  algorithms  exist  for  solving  this  problem.  For  the  case 
of  quadratic  programming,  the  penalty  function  problem  can  be  solved 
effectively  by  successive  overrelaxation  (SOR)  methods  which  can  handle 
huge problems while preserving sparsity features.


AMS (MOS) Subject Classifications: 90C30, 90C20
Key Words: Nonlinear programming, quadratic programming, penalty
functions, SOR methods

SIGNIFICANCE AND EXPLANATION


The  problem  of  minimizing  a  function  of  several  variables 
subject  to  inequality  constraints  is  reduced  to  the  problem  of 
maximizing  a  smooth  function  subject  to  nonnegativity  constraints. 
The  latter  problem  can  be  easily  solved  by  many  known  efficient 
methods.  Very  large  quadratic  problems  can  be  solved  by  using 
successive  over-relaxation  methods  which  will  preserve  any  sparsity 
the  original  problem  may  have. 



The responsibility for the wording and views expressed in this descriptive
summary lies with MRC, and not with the authors of this report.


A DUAL DIFFERENTIABLE EXACT PENALTY FUNCTION


S.-P. Han and O. L. Mangasarian

1. Introduction

It is well known that exterior penalty functions [6,13] in mathematical
programming suffer from one of two difficulties. Either the Hessian of the
penalty function becomes ill-conditioned as the penalty parameter
approaches infinity [6,20], or the penalty function is nondifferentiable
[13]. There have been, however, attempts at obtaining penalty functions
which are differentiable and for which the penalty parameter remains
finite [8,3,4,1]. We present here a different and extremely simple penalty
function which, by taking advantage of the structure of the dual problem,
is differentiable and has a penalty parameter that remains finite.

The key idea behind the present approach is best illustrated by the
following equality-constrained minimization problem

\[ \min_{x \in R^n} \; f(x) \quad \text{subject to} \quad h(x) = 0 \]

where $f$ and $h$ are differentiable functions from the n-dimensional
real Euclidean space $R^n$ into $R$ and $R^k$ respectively. The classical
exterior penalty problem for this problem is

\[ \min_{x \in R^n} \; f(x) + \frac{\alpha}{2}\,\|h(x)\|^2 \]

where $\alpha$ is a positive penalty parameter and $\|\cdot\|$ denotes the
2-norm. At stationary points of the penalty problem we have

\[ \nabla f(x) + \alpha \nabla h(x)^T h(x) = 0 \]

where $\nabla f(x)$ is the $n \times 1$ gradient of $f$, $\nabla h(x)$ is the
$k \times n$ Jacobian of $h$ and the superscript $T$ denotes the transpose.
In order for this condition to approach the stationarity conditions for the
minimization problem, which are

\[ \nabla f(x) + \nabla h(x)^T u = 0, \quad h(x) = 0 \]

where $u$ is a $k \times 1$ vector of Lagrange multipliers, the quantity
$\alpha h(x)$ must approach $u$. Since $h(x) \to 0$, it turns out that in
general $\alpha$ must approach $\infty$. There are exceptions. For example,
if $u = 0$ then $\alpha$ need not approach $\infty$. This is an exceptional
case which does not hold in general for the original minimization problem.

Sponsored by the United States Army under Contract No. DAAG29-80-C-0041.
This material is based upon work supported by the National Science
Foundation under Grants No. MCS-790166 and ENG-7903881.
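
The need for an unbounded penalty parameter can be seen on a one-dimensional toy problem. The sketch below is only an illustration: the problem (minimize x^2 subject to x - 1 = 0, with multiplier u = -2) and the function names are assumptions chosen here and do not come from the report. The stationary point of the classical penalty function is available in closed form, and it reaches the feasible point x = 1 only in the limit as alpha grows, while alpha*h(x) approaches the multiplier.

```python
def penalty_minimizer(alpha: float) -> float:
    """Stationary point of x**2 + (alpha/2)*(x - 1)**2.

    Solves 2*x + alpha*(x - 1) = 0 for the illustrative toy problem
    minimize x**2 subject to h(x) = x - 1 = 0.
    """
    return alpha / (2.0 + alpha)


def multiplier_estimate(alpha: float) -> float:
    """alpha * h(x) evaluated at the penalty minimizer."""
    return alpha * (penalty_minimizer(alpha) - 1.0)


if __name__ == "__main__":
    # The constraint violation shrinks only as alpha grows without bound.
    for alpha in (1.0, 10.0, 1e3, 1e6):
        x = penalty_minimizer(alpha)
        print(f"alpha={alpha:g}  x={x:.8f}  alpha*h(x)={multiplier_estimate(alpha):.8f}")
```

For any finite alpha the computed x is strictly less than 1, which is the behavior the dual penalty function of this report is designed to avoid.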

However, if we consider the Wolfe dual [22,15] to an inequality
constrained minimization problem, then the optimal Lagrange multiplier
associated with the equality constraint of the dual is zero provided that
the Hessian of the Lagrangian is nonsingular at the optimum. Hence for
the exterior penalty problem associated with the Wolfe dual we can show
(Theorems 1 to 4) that under rather natural conditions the penalty
parameter remains finite, and we obtain a globally differentiable
penalty function with a finite penalty parameter. Because our penalty
problem formulation depends in an essential manner on the dual problem,
our results are local results in the absence of convexity, and become
global results if convexity is assumed. Because our penalty function is
smooth and its parameter is finite, it has important computational
implications. For example, fast methods of smooth optimization could be
used to directly optimize the differentiable penalty function (Algorithm 1),
or the function may be used as in [12] in enlarging the convergence region
of fast but locally convergent algorithms [9,11]. In addition, for
positive definite quadratic programming problems, our penalty function
can be used to derive a successive overrelaxation (SOR) algorithm
without the need to invert the underlying positive definite matrix of
the problem (Algorithm 2). SOR algorithms have proved to be successful
in solving linear programming problems [17] and have the potential for
solving enormous problems that cannot be tackled by pivotal methods
while at the same time preserving the sparsity of the problem.

Besides this introduction, this paper contains two sections. In
Section 2 we treat the general nonlinear programming problem while in
Section 3 we specialize to the quadratic programming case to obtain
sharper results. Section 2 contains theorems relating stationary points,
local and global optima of the nonlinear inequality constrained problem
to those of the penalty problem. We also give a simple gradient
projection algorithm for optimizing the penalty function. In Section 3 we
have similar results for the quadratic programming case. We also present
an SOR method for quadratic programming which is a generalization of the
SOR method used with successful computational results on linear
programming [17].

We briefly describe our notation. All vectors in $R^n$ will be
column vectors unless transposed to a row vector by the superscript $T$.
$R^n_+$ will denote the nonnegative orthant $\{x \mid x \in R^n,\ x \ge 0\}$.
For $x$ in $R^n$, $x_i$, $i = 1,\ldots,n$, will denote its ith component,
while $x_+$ will denote a vector in $R^n$ with components
$(x_+)_i = \max\{x_i, 0\}$, $i = 1,\ldots,n$, and $\|x\|$ will denote the
Euclidean norm $(x^T x)^{1/2}$. For an $m \times n$ real matrix $A$, $A_i$
will denote the ith row, $A_{\cdot j}$ the jth column, and if
$I \subset \{1,\ldots,m\}$, $J \subset \{1,\ldots,n\}$ then $A_I$ will denote
the submatrix with rows $A_i$, $i \in I$, $A_{\cdot J}$ will denote the
submatrix with columns $A_{\cdot j}$, $j \in J$, and $A_{IJ}$ will denote
the submatrix with elements $A_{ij}$, $i \in I$ and $j \in J$. For a
differentiable function $f: R^n \to R$, $\nabla f(x)$ will denote the
$n \times 1$ gradient vector, while for a differentiable function
$g: R^n \to R^m$, $\nabla g(x)$ will denote the $m \times n$ Jacobian
matrix. For a twice differentiable function $L: R^n \times R^m \to R$,
$\nabla_x L(x,u)$ will denote the $n \times 1$ gradient with respect to $x$,
$\nabla_u L(x,u)$ will denote the $m \times 1$ gradient with respect to $u$,
and $\nabla^2 L(x,u)$ will denote the $(n+m) \times (n+m)$ Hessian with
respect to both $x$ and $u$ whose submatrix components are denoted as
follows

\[ \nabla^2 L(x,u) = \begin{bmatrix} \nabla_{xx}L(x,u) & \nabla_{xu}L(x,u) \\ \nabla_{ux}L(x,u) & \nabla_{uu}L(x,u) \end{bmatrix} \]


For a nonlinear programming problem such as (1) below, a point
$(\bar{x},\bar{u}) \in R^{n+m}$ satisfying the Karush-Kuhn-Tucker conditions
(1') is said to be a KKT point, while $\bar{x}$ is said to be a stationary
point of (1). Whenever a point $(\bar{x},\bar{u})$ is a KKT point, the
differentiability of $f$ and $g$ at $\bar{x}$ is implicitly assumed.


2. The General Nonlinear Programming Problem
We consider here the problem

\[ \min_{x \in R^n} \; f(x) \quad \text{subject to} \quad g(x) \le 0 \tag{1} \]

where $f$ is a function from the n-dimensional real Euclidean space $R^n$
into the reals and $g$ is from $R^n$ into $R^m$. Associated with this
problem is the Wolfe dual [22,15]

\[ \max_{(x,u) \in R^{n+m}} \; L(x,u) \quad \text{subject to} \quad \nabla_x L(x,u) = 0, \quad u \ge 0 \tag{2} \]

where $L(x,u) := f(x) + u^T g(x)$.

Our penalty function is derived from (2) by constructing an exterior
penalty function for the equality constraints only. Thus we define the
penalty function

\[ \theta(x,u,\gamma) := L(x,u) - \frac{\gamma}{2}\,\|\nabla_x L(x,u)\|^2 \tag{3} \]

and consider the penalty problem

\[ \max_{(x,u) \in R^{n+m},\; u \ge 0} \; \theta(x,u,\gamma) \tag{4} \]

whose objective is differentiable on $R^{n+m}$ when $f$ and $g$ are twice
differentiable on $R^n$. We shall relate various stationary and solution
points of problems (1), (2) and (4). We begin with a simple but useful
result.

Theorem 1 (Equivalence of stationary points of (1), (2) and (4))

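Let $f$ and $g$ be twice continuously differentiable at $\bar{x}$. Then

(a) $(\bar{x},\bar{u})$ is a stationary point of (2) and
    $\nabla_{xx}L(\bar{x},\bar{u})^{-1}$ exists
    $\Longrightarrow$ $(\bar{x},\bar{u})$ is a KKT point of (1)
    $\Longrightarrow$ $(\bar{x},\bar{u})$ is a stationary point of (4) for
    any $\gamma$.

(b) $(\bar{x},\bar{u})$ is a stationary point of (4), $\gamma \ne 0$ and
    $1/\gamma$ is not an eigenvalue of $\nabla_{xx}L(\bar{x},\bar{u})$
    $\Longrightarrow$ $(\bar{x},\bar{u})$ is a KKT point of (1)
    $\Longrightarrow$ $(\bar{x},\bar{u})$ is a stationary point of (2).
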


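Proof

The proof follows directly by writing the Karush-Kuhn-Tucker conditions
[15] (1'), (2') and (4') for problems (1), (2) and (4) respectively as follows

\[ \nabla_x L(\bar{x},\bar{u}) = 0, \quad g(\bar{x}) \le 0, \quad \bar{u}^T g(\bar{x}) = 0, \quad \bar{u} \ge 0 \tag{1'} \]

For some $\bar{v} \in R^n$:

\[ \left. \begin{aligned}
\nabla_x L(\bar{x},\bar{u}) - \nabla_{xx}L(\bar{x},\bar{u})\,\bar{v} &= 0 \\
g(\bar{x}) - \nabla g(\bar{x})\,\bar{v} &\le 0 \\
\bar{u}^T\big(g(\bar{x}) - \nabla g(\bar{x})\,\bar{v}\big) &= 0 \\
\bar{u} &\ge 0 \\
\nabla_x L(\bar{x},\bar{u}) &= 0
\end{aligned} \right\} \tag{2'} \]

\[ \left. \begin{aligned}
\big(I - \gamma\nabla_{xx}L(\bar{x},\bar{u})\big)\nabla_x L(\bar{x},\bar{u}) &= 0 \\
g(\bar{x}) - \gamma\nabla g(\bar{x})\nabla_x L(\bar{x},\bar{u}) &\le 0 \\
\bar{u}^T\big(g(\bar{x}) - \gamma\nabla g(\bar{x})\nabla_x L(\bar{x},\bar{u})\big) &= 0 \\
\bar{u} &\ge 0
\end{aligned} \right\} \tag{4'} \]

□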

In the next result we establish, under appropriate assumptions, the
local concavity of $\theta(x,u,\gamma)$ in both the variables $x$ and $u$.

Theorem 2 (Negative semidefiniteness and definiteness of $\nabla^2\theta(\bar{x},\bar{u},\gamma)$)

Let $(\bar{x},\bar{u})$ be a KKT point of (1), let $f$ and $g$ be twice
continuously differentiable at $\bar{x}$ and let
$\nabla_{xx}L(\bar{x},\bar{u})$ be positive definite with minimum eigenvalue
$\bar{\rho} > 0$. Then for $\gamma \ge 1/\bar{\rho}$, $(\bar{x},\bar{u})$ is a
stationary point of (4) and the Hessian $\nabla^2\theta(\bar{x},\bar{u},\gamma)$
with respect to $(x,u)$ is negative semidefinite. If in addition
$\gamma > 1/\bar{\rho}$ and $\nabla g(\bar{x})$ has linearly independent
rows, then $\nabla^2\theta(\bar{x},\bar{u},\gamma)$ is negative definite and
hence $(\bar{x},\bar{u})$ is a strict local maximum of (4).


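Proof

By Theorem 1, $(\bar{x},\bar{u})$ satisfies the KKT conditions (4') for
problem (4). We have from (3), when $f$ and $g$ are twice differentiable
at $x$, that

\[ \nabla\theta(x,u,\gamma) = \begin{bmatrix} \big(I - \gamma\nabla_{xx}L(x,u)\big)\nabla_x L(x,u) \\ g(x) - \gamma\nabla g(x)\nabla_x L(x,u) \end{bmatrix} \tag{5} \]

Recalling that $\nabla_x L(\bar{x},\bar{u}) = 0$ we have that

\[ \nabla^2\theta(\bar{x},\bar{u},\gamma) = \begin{bmatrix}
\nabla_{xx}L(\bar{x},\bar{u})\big(I - \gamma\nabla_{xx}L(\bar{x},\bar{u})\big) & \big(I - \gamma\nabla_{xx}L(\bar{x},\bar{u})\big)\nabla g(\bar{x})^T \\
\nabla g(\bar{x})\big(I - \gamma\nabla_{xx}L(\bar{x},\bar{u})\big) & -\gamma\nabla g(\bar{x})\nabla g(\bar{x})^T
\end{bmatrix} \]

Define

\[ C := \nabla_{xx}L(\bar{x},\bar{u}) \quad \text{and} \quad A := \nabla g(\bar{x}) \]

Then

\[ \nabla^2\theta(\bar{x},\bar{u},\gamma) = \begin{bmatrix} C(I-\gamma C) & (I-\gamma C)A^T \\ A(I-\gamma C) & -\gamma AA^T \end{bmatrix}
= \begin{bmatrix} C & A^T \\ A & 0 \end{bmatrix} - \gamma \begin{bmatrix} C \\ A \end{bmatrix} \begin{bmatrix} C & A^T \end{bmatrix} \]

and for $\gamma \ge 1/\bar{\rho}$ we have that

\[ \begin{aligned}
(x^T\ u^T)\,\nabla^2\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u \end{pmatrix}
&= x^TCx + 2x^TA^Tu - \gamma\|Cx + A^Tu\|^2 \\
&= -x^TCx + 2x^T(Cx + A^Tu) - \gamma\|Cx + A^Tu\|^2 \\
&\le -\bar{\rho}\|x\|^2 + 2\|x\|\,\|Cx + A^Tu\| - \gamma\|Cx + A^Tu\|^2 \\
&= -\bar{\rho}\Big(\|x\| - \frac{1}{\bar{\rho}}\|Cx + A^Tu\|\Big)^2 - \Big(\gamma - \frac{1}{\bar{\rho}}\Big)\|Cx + A^Tu\|^2 \\
&\le 0
\end{aligned} \]

Hence $\nabla^2\theta(\bar{x},\bar{u},\gamma)$ is negative semidefinite for
$\gamma \ge 1/\bar{\rho}$. If $(x,u) \ne 0$ then we consider two cases:

Case I: $Cx + A^Tu \ne 0$. For this case it follows from
$\gamma > 1/\bar{\rho}$ that

\[ (x^T\ u^T)\,\nabla^2\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u \end{pmatrix} < 0 \]

Case II: $Cx + A^Tu = 0$ and $(x,u) \ne 0$. For this case we have that
$x \ne 0$, else $u^TA = 0$, $u \ne 0$, which contradicts the assumption that
the rows of $A$ are linearly independent. Hence

\[ (x^T\ u^T)\,\nabla^2\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u \end{pmatrix} = -x^TCx < 0 \]

where the last inequality follows from the assumption that $C$ is
positive definite.

Thus in either case
$(x^T\ u^T)\,\nabla^2\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u \end{pmatrix} < 0$
for $(x,u) \ne 0$, so $\nabla^2\theta(\bar{x},\bar{u},\gamma)$ is negative
definite for $\gamma > 1/\bar{\rho}$ and $(\bar{x},\bar{u})$ is a strict local
maximum of (4) [6,13]. □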
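
The threshold on the penalty parameter can be checked numerically in the smallest possible case. The sketch below assumes scalar data C = c and A = a (an illustrative simplification, not part of the report), builds the resulting 2 x 2 Hessian of the penalty function at a KKT point, and computes its eigenvalues in closed form. The helper names are hypothetical.

```python
import math
from typing import List, Tuple


def hessian_theta(c: float, a: float, gamma: float) -> List[List[float]]:
    """2x2 Hessian [[C(I-gC), (I-gC)A^T], [A(I-gC), -g A A^T]] for scalar C, A."""
    k = 1.0 - gamma * c
    return [[c * k, k * a], [a * k, -gamma * a * a]]


def eigenvalues_sym2(h: List[List[float]]) -> Tuple[float, float]:
    """Eigenvalues (smallest, largest) of a symmetric 2x2 matrix."""
    (p, q), (_, r) = h
    mean = 0.5 * (p + r)
    radius = math.hypot(0.5 * (p - r), q)
    return mean - radius, mean + radius


if __name__ == "__main__":
    # With c = 1 the minimum eigenvalue is 1, so the threshold is gamma = 1.
    for gamma in (0.5, 1.0, 2.0):
        print(gamma, eigenvalues_sym2(hessian_theta(1.0, 1.0, gamma)))
```

With c = a = 1 the largest eigenvalue is positive below the threshold, zero at the threshold, and negative above it, in line with the semidefinite and definite cases of Theorem 2.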

The assumption in Theorem 2 that $\nabla g(\bar{x})$ has full row rank is
restrictive, but apparently it is the best we can do if we require that
$\nabla^2\theta(\bar{x},\bar{u},\gamma)$ be negative definite. A natural
relaxation is to merely ask for conditions that ensure that
$(\bar{x},\bar{u})$ is a strict local maximum of (4). It turns out that such
a relaxation amounts to replacing the linear independence of the rows of
$\nabla g(\bar{x})$ by the less stringent requirement of the linear
independence of the gradients of the active constraints only, as follows.

Theorem 3 (Strict local maximum of $\theta(x,u,\gamma)$)

The last sentence of Theorem 2 can be replaced by the following: If
in addition $\gamma > 1/\bar{\rho}$ and $\nabla g_i(\bar{x})$ are linearly
independent for $i \in J$ where

\[ J = \{\, i \mid g_i(\bar{x}) = 0,\ i = 1,\ldots,m \,\} \tag{8} \]

then $(\bar{x},\bar{u})$ is a strict local maximum of (4).

Proof

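Let $A_J = \nabla g_J(\bar{x})$. From the proof of Theorem 2, by replacing
$A$ by $A_J$, we have that $\nabla^2_{JJ}\theta(\bar{x},\bar{u},\gamma)$ is
negative definite for $\gamma > 1/\bar{\rho}$ where

\[ \nabla^2_{JJ}\theta(\bar{x},\bar{u},\gamma) := \begin{bmatrix} C(I-\gamma C) & (I-\gamma C)A_J^T \\ A_J(I-\gamma C) & -\gamma A_JA_J^T \end{bmatrix} \tag{9} \]

We establish now that $(\bar{x},\bar{u})$ is a strict local maximum of (4) by
establishing the second order sufficient optimality condition [6,13].

Note from (5) that $\nabla_u\theta(\bar{x},\bar{u},\gamma) = g(\bar{x})$, and
since the optimal multiplier associated with the nonnegativity constraint
$u \ge 0$ is $-\nabla_u\theta(\bar{x},\bar{u},\gamma)$, the second order
sufficient optimality condition for (4) is then

\[ (x^T\ u^T)\,\nabla^2\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u \end{pmatrix} < 0 \quad \text{for all } (x,u) \ne 0 \text{ with } u_E = 0,\ u_G \ge 0 \tag{10} \]

where

\[ \begin{aligned}
E &= \{\, i \mid \bar{u}_i = 0,\ g_i(\bar{x}) < 0 \,\} \\
G &= \{\, i \mid \bar{u}_i = 0,\ g_i(\bar{x}) = 0 \,\} \\
H &= \{\, i \mid \bar{u}_i > 0,\ g_i(\bar{x}) = 0 \,\}
\end{aligned} \]

Since $J = G \cup H$ it follows that the second order condition (10) can be
rewritten as

\[ (x^T\ u_G^T\ u_H^T)\,\nabla^2_{JJ}\theta(\bar{x},\bar{u},\gamma)\begin{pmatrix} x \\ u_G \\ u_H \end{pmatrix} < 0 \quad \text{for } 0 \ne \begin{pmatrix} x \\ u_G \\ u_H \end{pmatrix},\ u_G \ge 0 \tag{11} \]

Condition (11) is automatically satisfied for $\gamma > 1/\bar{\rho}$ because
we have already established that $\nabla^2_{JJ}\theta(\bar{x},\bar{u},\gamma)$
is negative definite for $\gamma > 1/\bar{\rho}$. □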

So far no convexity assumptions have been made anywhere and
consequently all our results are local results. We can globalize some of
our results if we assume that $f$ is uniformly strictly convex and $g$
is convex on $R^n$. In fact we can show then that for each stationary
point $(x(\gamma), u(\gamma))$ of (4), $x(\gamma)$ is the unique global
solution of (1). In particular we have the following.

Theorem 4 (Stationary points of (4) as global solutions of (1) and (2))

Let $f$ and $g$ be convex and twice continuously differentiable on
$R^n$, let

\[ y^T\nabla^2 f(x)\,y \ge \nu\|y\|^2 \quad \text{for all } x, y \in R^n \text{ and some } \nu > 0, \tag{12} \]

and let $\gamma > 1/\nu$. For every stationary point
$(x(\gamma), u(\gamma))$ of (4), $x(\gamma)$ is independent of $\gamma$ and
$x(\gamma) = \bar{x}$, where $\bar{x}$ is the unique solution of (1).


Proof

For $x$, $y$ in $R^n$ and $u \in R^m$, $u \ge 0$ we have that

\[ y^T\nabla_{xx}L(x,u)\,y \ge y^T\nabla^2 f(x)\,y \ge \nu\|y\|^2 \tag{13} \]

Hence $\nabla_{xx}L(x,u)$ is positive definite for all $u \ge 0$ and its
smallest eigenvalue $\rho(x,u)$ satisfies the inequality
$\rho(x,u) \ge \nu$. By Theorem 1(b) every stationary point
$(x(\gamma), u(\gamma))$ satisfies the KKT conditions (1') of (1). Since $f$
is strictly convex and $g$ is convex, $x(\gamma)$ must equal the unique
solution $\bar{x}$ of (1) and $(\bar{x}, u(\gamma))$ must solve (2) [15]. □

We note that problem (4) can be used directly to construct an
algorithm for solving the original problem. For example we can easily
prescribe a Levitin-Poljak gradient projection algorithm [14] or a
superlinearly convergent quasi-Newton algorithm [10,7,9,11,21]. The key
observation is that the projection operation here is an extremely simple
one, namely projection on $R^n \times R^m_+$. We give below the simplest
gradient projection algorithm for solving (4) and establish its
convergence to a KKT point of (1).

Algorithm 1 (Gradient projection algorithm for (4))

Choose $\gamma > 0$ and any $(x^0,u^0) \in R^n \times R^m_+$. Having
$(x^i,u^i)$ compute $(x^{i+1},u^{i+1})$ as follows:

Direction choice:

\[ \begin{aligned}
p^i &= \big(I - \gamma\nabla_{xx}L(x^i,u^i)\big)\nabla_x L(x^i,u^i) \\
q^i &= \big(u^i + g(x^i) - \gamma\nabla g(x^i)\nabla_x L(x^i,u^i)\big)_+ - u^i
\end{aligned} \]

Stepsize choice: $(x^{i+1},u^{i+1}) = (x^i + \lambda^i p^i,\ u^i + \lambda^i q^i)$
where $\lambda^i$ is chosen such that

\[ \theta(x^i + \lambda^i p^i,\ u^i + \lambda^i q^i,\ \gamma) = \max_{\lambda}\ \{\, \theta(x^i + \lambda p^i,\ u^i + \lambda q^i,\ \gamma) \mid u^i + \lambda q^i \ge 0 \,\} \]

where $\theta$ is defined by (3).
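
The steps of Algorithm 1 can be sketched on a one-dimensional toy problem. Everything specific in the code below is an assumption made for illustration: the problem (minimize 0.5*(x - 2)^2 subject to x - 1 <= 0, with KKT point x = 1, u = 1), the function names, and the closed-form line search, which is exact here only because the penalty function of this toy problem is a concave quadratic when gamma > 1.

```python
from typing import Tuple


def grad_theta(x: float, u: float, gamma: float) -> Tuple[float, float]:
    """Gradient (5) for L(x, u) = 0.5*(x - 2)**2 + u*(x - 1)."""
    r = (x - 2.0) + u  # grad_x L; hess_xx L = 1 and grad g = 1
    return (1.0 - gamma) * r, (x - 1.0) - gamma * r


def gradient_projection(x: float, u: float, gamma: float,
                        iters: int = 500, tol: float = 1e-24) -> Tuple[float, float]:
    """Gradient projection with exact line search on the toy penalty problem."""
    k = 1.0 - gamma
    hxx, hxu, huu = k, k, -gamma  # constant Hessian of theta for this problem
    for _ in range(iters):
        gx, gu = grad_theta(x, u, gamma)
        p = gx                       # direction in x
        q = max(u + gu, 0.0) - u     # projected direction in u
        curvature = hxx * p * p + 2.0 * hxu * p * q + huu * q * q
        if p * p + q * q <= tol or curvature >= 0.0:
            break                    # converged, or theta not concave along (p, q)
        step = -(gx * p + gu * q) / curvature  # unconstrained maximizer along (p, q)
        if q < 0.0:
            step = min(step, -u / q)           # keep u + step*q >= 0
        x, u = x + step * p, max(u + step * q, 0.0)
    return x, u


if __name__ == "__main__":
    print(gradient_projection(0.0, 0.0, gamma=2.0))
    print(gradient_projection(3.0, 0.0, gamma=2.0))
```

Both starting points are driven toward the KKT point (1, 1) with the fixed, finite value gamma = 2. For a general nonlinear problem the line search would have to be carried out numerically.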

By standard convergence results [14] and by Theorem 1 we have the
following.

Theorem 5 (Convergence of Algorithm 1)

Let $f$ and $g$ be thrice differentiable on $R^n$. Each accumulation point
$(\bar{x},\bar{u})$ of the sequence $\{(x^i,u^i)\}$ generated by the gradient
projection Algorithm 1, such that $1/\gamma$ is not an eigenvalue of
$\nabla_{xx}L(\bar{x},\bar{u})$, is a KKT point of (1).


3. The Quadratic Programming Problem

In this section we specialize our results to the quadratic programming
problem and obtain some sharper results. However the principal purpose of
this section is to describe an SOR method for solving the quadratic
programming problem which does not require the inversion of the matrix
defining the quadratic term [17]. This should substantially widen the
applicability of SOR methods to mathematical programming problems, which
have hitherto been limited principally to the minimization of quadratic
functions on the nonnegative orthant [16,17,18]. The principal advantages
of SOR methods are their ability to handle extremely large problems and to
preserve sparsity.

We shall consider here the quadratic program

\[ \min_{x \in R^n} \; \tfrac{1}{2}x^TCx + d^Tx \quad \text{subject to} \quad Ax \le b \tag{14} \]

where $C$ is an $n \times n$ symmetric matrix, $A$ is an $m \times n$ matrix,
$d$ is in $R^n$ and $b$ is in $R^m$. The dual to this problem obtained from
(2) is

\[ \max_{(x,u) \in R^{n+m}} \; \tfrac{1}{2}x^TCx + d^Tx + u^T(Ax - b) \quad \text{subject to} \quad Cx + d + A^Tu = 0, \quad u \ge 0 \tag{15} \]

We note in passing that the standard quadratic programming dual [5,15],
obtained by substituting from the equality constraint into the objective
function of (15),

\[ \max_{(x,u) \in R^{n+m}} \; -\tfrac{1}{2}x^TCx - b^Tu \quad \text{subject to} \quad Cx + d + A^Tu = 0, \quad u \ge 0 \tag{16} \]

cannot be used to obtain a differentiable exact penalty function because
the optimal multiplier associated with the equality constraint in (15)
is zero when $C$ is nonsingular, whereas it is equal to $\bar{x}$ in (16),
also when $C$ is nonsingular [15].

The penalty function associated with (15) is

\[ \phi(x,u,\gamma) := \tfrac{1}{2}x^TCx + d^Tx + u^T(Ax - b) - \frac{\gamma}{2}\,\|Cx + A^Tu + d\|^2 \tag{17} \]

and the associated penalty problem is

\[ \max_{(x,u) \in R^{n+m},\; u \ge 0} \; \phi(x,u,\gamma) \tag{18} \]

We have as an immediate consequence of Theorems 2 and 3 the following.

Theorem 6 (Concavity and strict concavity of $\phi(x,u,\gamma)$)

Let $C$ be positive definite with minimum eigenvalue $\bar{\rho} > 0$. Then
for $\gamma \ge 1/\bar{\rho}$, $\nabla^2\phi(x,u,\gamma)$ is negative
semidefinite and hence $\phi(x,u,\gamma)$ is a concave function of $(x,u)$
on $R^{n+m}$. If in addition $\gamma > 1/\bar{\rho}$ and $A$ has linearly
independent rows, then $\nabla^2\phi(x,u,\gamma)$ is negative definite and
hence $\phi(x,u,\gamma)$ is a strictly concave function on $R^{n+m}$. If
$\gamma > 1/\bar{\rho}$ and only $A_i$, $i \in J$, are linearly independent
where

\[ J = \{\, i \mid A_i\bar{x} = b_i,\ i = 1,\ldots,m \,\} \]

and $(\bar{x},\bar{u})$ is a KKT point of (14), then $(\bar{x},\bar{u})$ is a
strict global maximum solution of (18).

Corollary 1. Let $\{x \mid Ax \le b\}$ be nonempty, let $C$ be positive
definite with least eigenvalue $\bar{\rho} > 0$. Then for each
$\gamma \ge 1/\bar{\rho}$, problem (18) is a concave quadratic maximization
problem which possesses a solution $(x(\gamma), u(\gamma))$ with
$x(\gamma)$ independent of $\gamma$ and $x(\gamma) = \bar{x}$ where $\bar{x}$
is the unique global solution of (14).


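With the help of the SOR scheme of [16] we can solve iteratively the
quadratic program (18) in $R^n \times R^m_+$ and thereby obtain a solution
to (14). It will be convenient for that purpose to have the following
expressions at hand

\[ \nabla\phi(x,u,\gamma) = \begin{bmatrix} (I - \gamma C)(Cx + A^Tu + d) \\ Ax - b - \gamma A(Cx + A^Tu + d) \end{bmatrix} \tag{19} \]

\[ \nabla^2\phi(x,u,\gamma) = \begin{bmatrix} C(I-\gamma C) & (I-\gamma C)A^T \\ A(I-\gamma C) & -\gamma AA^T \end{bmatrix} \tag{20} \]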
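An SOR method for solving the quadratic program (18) with relaxation
factor $\omega \in (0,2)$ can then be given as follows

\[ \begin{aligned}
x_j^{i+1} &= x_j^i - \frac{\omega}{\big(\nabla_{xx}\phi(x^i,u^i,\gamma)\big)_{jj}}\,
\nabla_{x_j}\phi\big(x_1^{i+1},\ldots,x_{j-1}^{i+1},x_j^i,\ldots,x_n^i,u^i,\gamma\big),
\quad j = 1,\ldots,n \\
u_j^{i+1} &= \Big(u_j^i - \frac{\omega}{\big(\nabla_{uu}\phi(x^i,u^i,\gamma)\big)_{jj}}\,
\nabla_{u_j}\phi\big(x^{i+1},u_1^{i+1},\ldots,u_{j-1}^{i+1},u_j^i,\ldots,u_m^i,\gamma\big)\Big)_+,
\quad j = 1,\ldots,m
\end{aligned} \tag{21} \]

Here the diagonal Hessian elements $(\nabla_{xx}\phi)_{jj}$ and
$(\nabla_{uu}\phi)_{jj}$ are negative, so each update is a relaxed
coordinate ascent step.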

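We spell out our SOR scheme in detail now.

Algorithm 2 (SOR scheme for (18))

Choose $\omega \in (0,2)$, $\gamma > \max_{1 \le j \le n} C_{jj}/\|C_j\|^2$
with $1/\gamma$ not an eigenvalue of $C$, and
$(x^0,u^0) \in R^n \times R^m_+$. Having $(x^i,u^i)$ compute
$(x^{i+1},u^{i+1})$ as follows:

\[ x_j^{i+1} = x_j^i - \frac{\omega}{C_{jj} - \gamma\|C_j\|^2}
\Big( (I_j - \gamma C_j)\Big( \sum_{l<j} C_{\cdot l}\,x_l^{i+1} + \sum_{l \ge j} C_{\cdot l}\,x_l^i + A^Tu^i + d \Big) \Big),
\quad j = 1,\ldots,n \]

\[ u_j^{i+1} = \Big( u_j^i + \frac{\omega}{\gamma\|A_j\|^2}
\Big( A_jx^{i+1} - b_j - \gamma A_j\Big( Cx^{i+1} + \sum_{l<j} (A^T)_{\cdot l}\,u_l^{i+1} + \sum_{l \ge j} (A^T)_{\cdot l}\,u_l^i + d \Big) \Big) \Big)_+,
\quad j = 1,\ldots,m \]

where the sums over $l < j$ are present for $j > 1$ only.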
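
The coordinate updates can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the authors' implementation: the function name `sor_qp`, the dense list-of-lists storage, the fixed number of sweeps, and the small test problem are all choices made here. Each step divides the partial derivative of the penalty function by the corresponding (negative) diagonal Hessian element, namely C_jj - gamma*|C_j|^2 for x and -gamma*|A_j|^2 for u, and the u update is followed by projection onto the nonnegative orthant.

```python
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sor_qp(C: Matrix, A: Matrix, d: Vector, b: Vector, gamma: float,
           omega: float = 1.0, sweeps: int = 500) -> Tuple[Vector, Vector]:
    """SOR-type coordinate ascent on the penalty function of
    minimize 0.5 x^T C x + d^T x subject to A x <= b (C symmetric)."""
    n, m = len(C), len(A)
    x = [0.0] * n
    u = [0.0] * m

    def residual() -> Vector:
        # r = C x + A^T u + d, always using the most recent x and u
        return [sum(C[k][l] * x[l] for l in range(n))
                + sum(A[i][k] * u[i] for i in range(m)) + d[k]
                for k in range(n)]

    for _ in range(sweeps):
        for j in range(n):
            r = residual()
            grad_j = r[j] - gamma * sum(C[j][k] * r[k] for k in range(n))
            h_jj = C[j][j] - gamma * sum(C[j][k] ** 2 for k in range(n))
            x[j] -= omega * grad_j / h_jj
        for j in range(m):
            r = residual()
            grad_j = (sum(A[j][k] * x[k] for k in range(n)) - b[j]
                      - gamma * sum(A[j][k] * r[k] for k in range(n)))
            h_jj = -gamma * sum(A[j][k] ** 2 for k in range(n))
            u[j] = max(0.0, u[j] - omega * grad_j / h_jj)
    return x, u


if __name__ == "__main__":
    # Hypothetical test problem: minimize x1^2 + 0.5*x2^2 - 2*x1 - x2
    # subject to x1 + x2 <= 1; by hand, x = (2/3, 1/3) with multiplier 2/3.
    C = [[2.0, 0.0], [0.0, 1.0]]
    A = [[1.0, 1.0]]
    print(sor_qp(C, A, d=[-2.0, -1.0], b=[1.0], gamma=2.0))
```

The residual is recomputed from scratch before every coordinate step to keep the sketch short. A practical implementation would update it incrementally and store only the nonzero entries of the rows of C and A.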


Remark 1

The only implicit assumption in Algorithm 2 is that $A_j \ne 0$,
$j = 1,\ldots,m$. This assumption imposes no restrictions whatsoever, since
all constraints $A_jx \le b_j$ of (14) for which $A_j = 0$ are either
inconsistent ($b_j < 0$) or else can be discarded.

Remark  2 

Note that Algorithm 2 needs only linear arrays, as distinct from
rectangular arrays. That is, we need to access the rows and columns of
$C$ and $A$ one at a time. Thus, if the problem is of enormous size and
very sparse, then only the nonzero elements need be stored and, unlike in
pivotal algorithms, this sparsity is never lost.

We  can  now  use  the  convergence  theorems  of  [16]  and  the  theorems 
of  this  paper  to  obtain  the  following  convergence  result  for  the  SOR 
Algorithm  2. 

Theorem 7 (Monotonicity and convergence of the SOR Algorithm 2)

For the sequence $\{(x^i,u^i)\}$, $i = 1,2,\ldots$, generated by Algorithm 2

\[ \phi(x^{i+1},u^{i+1},\gamma) \ge \phi(x^i,u^i,\gamma), \quad i = 0,1,\ldots \tag{22} \]

and each accumulation point $(\bar{x},\bar{u})$ of the sequence
$\{(x^i,u^i)\}$ is a KKT point of the original quadratic program (14). If in
addition $C$ is positive semidefinite then $\bar{x}$ is a global solution
of (14).

Proof

Inequality (22) follows from (9) of [16] and by Theorem 2.1 of [16]
$(\bar{x},\bar{u})$ is a stationary point of (18). By Theorem 1,
$(\bar{x},\bar{u})$ is a KKT point of (14). When $C$ is positive
semidefinite $\bar{x}$ is a global minimum solution of (14) by the
sufficiency of the KKT conditions [15]. □

We note that Theorem 7 does not ensure the existence of an
accumulation point $(\bar{x},\bar{u})$ of the sequence $\{(x^i,u^i)\}$ of
Algorithm 2. To ensure that at least one accumulation point exists we need
to impose some sort of qualification similar to that of Theorem 2.2 of
[16] which will ensure the boundedness of the iterates $\{(x^i,u^i)\}$ of
Algorithm 2.

In  particular  we  have  the  following. 

Theorem 8 (Boundedness of the iterates of the SOR Algorithm 2)

Let $C$ be positive definite with minimum eigenvalue $\bar{\rho} > 0$, let
$A$ have linearly independent columns, let $\hat{x}$ satisfy the constraint
qualification $A\hat{x} < b$ and let $\gamma > 1/\bar{\rho}$. Then the
sequence $\{(x^i,u^i)\}$, $i = 1,2,\ldots$, generated by the SOR Algorithm 2
is bounded and $\lim_{i \to \infty} x^i = \bar{x}$, where $\bar{x}$ is the
unique global solution of (14).

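Proof

By Theorem 6 the constant Hessian $\nabla^2\phi(x,u,\gamma)$ defined by (20)
is negative semidefinite. We shall assume that the sequence
$\{(x^i,u^i)\}$ generated by Algorithm 2 is unbounded and exhibit a
contradiction. Without loss of generality suppose that
$\|(x^i,u^i)\| \ne 0$ and $\|(x^i,u^i)\| \to \infty$.

Define

\[ z := \begin{pmatrix} x \\ u \end{pmatrix}, \quad M := \nabla^2\phi(x,u,\gamma), \quad
q := \begin{pmatrix} q_1 \\ q_2 \end{pmatrix} := \begin{pmatrix} (I - \gamma C)d \\ -b - \gamma Ad \end{pmatrix} \]

Then, apart from the additive constant $-\frac{\gamma}{2}\|d\|^2$, which does
not affect the argument,

\[ \phi(x,u,\gamma) = \phi(z,\gamma) = \tfrac{1}{2}z^TMz + q^Tz \]

It follows from (22) and Algorithm 2 for $i = 1,2,\ldots$, that
$u^i \ge 0$ and

\[ \frac{\phi(z^1,\gamma)}{\|z^i\|^2} \le \frac{\phi(z^i,\gamma)}{\|z^i\|^2}
= \frac{1}{2}\,\frac{(z^i)^T}{\|z^i\|}\,M\,\frac{z^i}{\|z^i\|} + \frac{q^Tz^i}{\|z^i\|^2} \]

By the Bolzano-Weierstrass Theorem we get that $\{z^i/\|z^i\|\}$ has an
accumulation point $y$ on the unit sphere in $R^{n+m}$ satisfying
$0 \le \frac{1}{2}y^TMy$ and $y = \begin{pmatrix} x \\ u \end{pmatrix}$ with
$x \in R^n$ and $u \in R^m_+$. Since $M$ is negative semidefinite it
follows that $y^TMy = 0$ and hence $My = 0$. Since we also have that

\[ \frac{\phi(z^1,\gamma)}{\|z^i\|} \le \frac{1}{2}\,\frac{(z^i)^TMz^i}{\|z^i\|} + \frac{q^Tz^i}{\|z^i\|} \le \frac{q^Tz^i}{\|z^i\|} \]

it follows that $0 \le q^Ty$. We thus have

\[ My = 0, \quad q^Ty \ge 0, \quad 0 \ne y = \begin{pmatrix} x \\ u \end{pmatrix}, \quad u \ge 0 \tag{23} \]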


or equivalently

\[ \begin{bmatrix} C(I-\gamma C) & (I-\gamma C)A^T \\ A(I-\gamma C) & -\gamma AA^T \end{bmatrix}
\begin{pmatrix} x \\ u \end{pmatrix} = 0, \quad q_1^Tx + q_2^Tu \ge 0, \quad u \ge 0, \quad (x,u) \ne 0 \tag{24} \]

From the generalized Gordan theorem of the alternative [19], (24) is
equivalent to either

\[ \text{the rows of } \begin{bmatrix} C(I-\gamma C) & (I-\gamma C)A^T \end{bmatrix} \text{ are linearly dependent} \tag{25} \]

or

\[ \left. \begin{aligned}
C(I-\gamma C)v + (I-\gamma C)A^Tw &= (I-\gamma C)d \\
A(I-\gamma C)v - \gamma AA^Tw &> -b - \gamma Ad
\end{aligned} \right\} \text{ has no solution } (v,w) \text{ in } R^{n+m} \tag{26} \]

Because $\gamma > 1/\bar{\rho}$ it follows that $I - \gamma C$ is negative
definite and that $C(I-\gamma C)$ is nonsingular, which contradicts (25).
We will show now that (26) also leads to a contradiction. By hypothesis we
have that $A\hat{x} < b$. Since the columns of $A$ are linearly
independent, there exists a $w$ satisfying

\[ A^Tw = d + C\hat{x} \]

and hence

\[ A\hat{x} = AC^{-1}(A^Tw - d) < b \]

that is

\[ AC^{-1}d - AC^{-1}A^Tw + b > 0 \]

or

\[ AC^{-1}\big((I-\gamma C)d - (I-\gamma C)A^Tw\big) - \gamma AA^Tw > -b - \gamma Ad \]

By defining

\[ v = (I-\gamma C)^{-1}C^{-1}\big((I-\gamma C)d - (I-\gamma C)A^Tw\big) \]

we get

\[ \begin{aligned}
C(I-\gamma C)v + (I-\gamma C)A^Tw &= (I-\gamma C)d \\
A(I-\gamma C)v - \gamma AA^Tw &> -b - \gamma Ad
\end{aligned} \]

These last two relations contradict (26). Consequently the sequence
$\{(x^i,u^i)\}$ is bounded and must have at least one accumulation point.
For each accumulation point $(\tilde{x},\tilde{u})$, $\tilde{x}$ must equal
the unique solution $\bar{x}$ of (14). Since $\{x^i\}$ is also bounded it
must converge to $\bar{x}$ [2]. □

At this time we do not have any computational experience for the SOR
Algorithm 2 for solving the general quadratic programming problem (14).
However, for the case when the matrix $C = \varepsilon I$, where
$\varepsilon$ is a positive number, and $\gamma = 1/\varepsilon$, the
penalty problem (18) becomes, after multiplying the objective by
$\varepsilon$,

\[ \max_{u \in R^m,\; u \ge 0} \; -\tfrac{1}{2}\|A^Tu + d\|^2 - \varepsilon\,b^Tu \tag{27} \]

This is precisely the dual of the quadratic program perturbation of [17]
associated with the linear program

\[ \min_{x \in R^n} \; d^Tx \quad \text{subject to} \quad Ax \le b \tag{28} \]

and which was solved quite successfully by the SOR method proposed here.
Thus for at least this special class of quadratic programs computational
experience is very encouraging. It is hoped that this experience will
carry over to the more general case.